[...] refer to these rules as onto-relational since they combine DL-based ontology languages
and Knowledge Representation formalisms supporting the relational data
model within the tradition of Logic Programming and Deductive Databases. Rule
authoring is a very demanding Knowledge Engineering task which can be partially
automated by applying Machine Learning algorithms. In this chapter
we show how Inductive Logic Programming (ILP), born at the intersection of Machine
Learning and Logic Programming and considered as a major approach to
[...] illustration, we provide details of a specific Onto-Relational Learning solution to
the problem of learning rule-based definitions of DL concepts and roles with ILP.

Introduction

Rules are widely used in Knowledge Engineering (KE) and Knowledge Representation
(KR) as a powerful way of modeling knowledge. In the broadest sense, a rule could be
any statement which says that a certain conclusion must be valid whenever a certain
premise is satisfied, i.e. any statement that could be read as a sentence of the form
"if ... then ...". Rules have been successfully applied in the fields of Logic Programming (LP)
and Deductive Databases [6]. Rules also play a role in the Semantic Web architecture.
Interest in this area has grown rapidly over recent years, as testified by the Rule
Interchange Format (RIF) activity at W3C. Rules from the RIF perspective would allow
the integration, transformation and derivation of data from numerous sources in a
distributed, scalable, and transparent manner. Because of the great variety in rule languages
and rule engine technologies, RIF consists of a core language to be used along with a set
of standard and non-standard extensions. These extensions need not all be combinable
into a single unified language. As for the expressive power, two directions are followed:
monotonic extensions towards full First Order Logic (FOL) and non-monotonic (NM)
extensions based on the LP tradition. The debate around RIF has taken a long time, partly
because of the controversial issue of whether rules should sit on top of or alongside
ontologies [19]. There is a consensus now on the fact that rules complement and extend
ontologies. Indeed, rules can be used in combination with ontologies, or as a means to
specify ontologies. They are also frequently applied over ontologies, to draw inferences,
express constraints, specify policies, react to events, discover new knowledge, transform
data, etc. In particular, RIF rules can refer to RDF and OWL facts. Since the design of OWL
has been based on the $\mathcal{SH}$ family of very expressive Description Logics (DLs)
(see Chapter ?? for an introduction), the NM dialects of RIF will most likely be inspired by
those hybrid KR systems that integrate DLs and LP. Such rule formalisms are of interest to
this chapter. We shall refer to them as onto-relational rule languages from now on. Apart
from the specific ontology language, the integration of ontologies and rules is already
present in existing knowledge bases (KBs). Notably, the Cyc KB consists of terms (which
constitute the vocabulary, i.e. the ontology) and assertions which relate those terms and
include both simple ground assertions and rules [22].

The acquisition of rules for very large KBs like Cyc is a very demanding KE activity.
Indeed, according to an estimate from the Cyc project, human experts produce rules
at the rate of approximately three per hour but can evaluate an average of twenty rules
per hour. Also, for untrained knowledge engineers, while rule authoring may be very
difficult, rule reviewing is feasible (although still difficult). A partial automation of the
rule authoring task, e.g. by applying Machine Learning (ML) algorithms (see Chapter ??
for an introduction), can be of help even though the automatically produced rules are
not guaranteed to be correct. In fact, of those rules, some will turn out to be correct, and
some will be found to need editing to be assertible. Yet, as mentioned above, rule
reviewing is less critical than rule authoring. In order to partially automate the authoring of
onto-relational rules, the family of ML techniques collectively known under the name of
Inductive Logic Programming (ILP) [40] seems particularly promising for the following
reasons. ILP was born at the intersection of ML and LP [39], and is widely recognized
as a major approach to Relational Learning [7]. Apart from the KR framework of LP,
the distinguishing feature of ILP, also with respect to other ML forms, is the use of prior
domain knowledge in the form of a logical theory during the induction process. In this
chapter we take a critical look at ILP proposals for learning relational rules while having
an ontology as the background theory. These proposals try to overcome the difficulties
of accommodating ontologies in Relational Learning. The work of [3] on using semantic
meta-knowledge from Cyc as inductive bias in an ILP system is another attempt at
solving this problem, though more empirically. Conversely, we promote an extension of
Relational Learning, called Onto-Relational Learning (ORL), which accounts for
ontologies in a clear, elegant and well-founded manner by resorting to onto-relational rule
languages. In this chapter, for the sake of illustration, we provide details of a specific
ORL solution to the problem of learning rule-based definitions of DL concepts and roles
with ILP.

The chapter is organized as follows. Section 1 is devoted to preliminaries on LP and
its applications to databases and ontologies as well as on ILP. Section 2 provides a
state-of-the-art survey of ILP proposals for learning onto-relational rules. Section 3 describes
in depth the most powerful of these proposals. Section 4 concludes the chapter with final
remarks and outlines directions of future work.

1. Preliminaries



1.1. Logic Programming and databases 

Logic Programming (LP) is rooted in a fragment of Clausal Logics (CLs) known as
Horn Clausal Logic (HCL) [34]. The basic element in CLs is the atom of the form
$p(t_1, \ldots, t_k)$ such that $p$ is a predicate symbol and each $t_j$ is a term. A term is
either a constant or a variable or a more complex term obtained by applying a functor
to simpler terms. Constant, variable, functor and predicate symbols belong to mutually
disjoint alphabets. A literal is an atom either negated or not. A clause is a universally
quantified disjunction of literals. Usually the universal quantifiers are omitted to simplify
notation. Alternative notations are a clause as a set of literals and a clause as an
implication. A program is a set of clauses. HCL admits only so-called definite clauses. A definite
clause is an implication of the form

$\alpha_0 \leftarrow \alpha_1, \ldots, \alpha_m$

where $m \geq 0$ and $\alpha_0, \ldots, \alpha_m$ are atoms, i.e. a clause with exactly one
positive literal. The left-hand side $\alpha_0$ and the right-hand side $\alpha_1, \ldots, \alpha_m$
of the implication are called head and body of the clause, respectively. Note that the body is
intended to be an existentially quantified conjunctive formula
$\exists\, \alpha_1 \wedge \ldots \wedge \alpha_m$. Furthermore, definite clauses with $m > 0$
and $m = 0$ are called rules and facts, respectively. The model-theoretic semantics of
HCL is based on the notion of Herbrand interpretation, i.e. an interpretation in which
all constants and function symbols are assigned very simple meanings. This allows the
symbols in a set of clauses to be interpreted in a purely syntactic way, separated from any
real instantiation. The corresponding proof-theoretic semantics is based on the Closed
World Assumption (CWA), i.e. the presumption that what is not currently known to be
true is false. Deductive reasoning with HCL is formalized in its proof theory. In clausal
logic, resolution comprises a single inference rule which, from any two clauses having
an appropriate form, derives a new clause as their consequence. Resolution is sound:
every resolvent is implied by its parents. It is also refutation complete: the empty clause
is derivable by resolution from any set $S$ of Horn clauses if $S$ is unsatisfiable. Negation
As Failure (NAF) is related to the CWA, as it amounts to believing false every predicate
that cannot be proved to be true. Clauses with NAF literals in the body are called normal
clauses. The concept of a stable model, or answer set, is used to define a declarative
semantics for normal logic programs [16]. According to this semantics, a logic program
may have several alternative models (but possibly none), each corresponding to a
possible view of the reality. Also based on the stable model (answer set) semantics, Answer
Set Programming (ASP) is an alternative LP paradigm oriented towards difficult search
problems [35].
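
The remark that a normal program may have several stable models, or none, can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: programs are ground and propositional, each rule is a hypothetical triple (head, positive body, NAF body), and candidates are enumerated by brute force, which is not how ASP solvers work. A candidate set is accepted when it equals the least model of its Gelfond-Lifschitz reduct.

```python
from itertools import chain, combinations

# A ground normal rule is a triple (head, positive_body, naf_body).
# Hypothetical example program:  p <- not q.   q <- not p.
PROGRAM = [
    ("p", frozenset(), frozenset({"q"})),
    ("q", frozenset(), frozenset({"p"})),
]

def least_model(definite_rules):
    """Least Herbrand model of a ground definite program, by forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def reduct(program, candidate):
    """Drop every rule whose NAF body meets the candidate; strip NAF literals elsewhere."""
    return [(head, pos) for head, pos, neg in program if not (neg & candidate)]

def stable_models(program):
    """Enumerate all atom subsets and keep those that reproduce themselves."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in pos | neg})
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))
    return [set(s) for s in subsets
            if least_model(reduct(program, set(s))) == set(s)]

print(stable_models(PROGRAM))                                   # two alternative models
print(stable_models([("a", frozenset(), frozenset({"a"}))]))   # no stable model
```

The first program has the two stable models {p} and {q}, while the single rule a <- not a has none, matching the "several alternative models (but possibly none)" behaviour described above.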

Definite clauses played a prominent role in the rise of deductive databases [6]. More
precisely, functor-free non-recursive definite clauses are at the basis of the language
Datalog for deductive databases [?]. Generally, it is denoted by Datalog$^{\neg}$, where
$\neg$ is treated as NAF. The restriction of Datalog$^{\neg}$ to only positive rules (i.e., rules
without NAF literals) is denoted by Datalog. Based on the distinction between extensional and
intensional predicates, a Datalog program $\Pi$ can be divided into two parts. The
extensional part, denoted as $EDB(\Pi)$, is the set of facts of $\Pi$ involving the extensional
predicates, whereas the intensional part $IDB(\Pi)$ is the set of all other clauses of $\Pi$. The main
reasoning task in Datalog is query answering. A query $Q$ to a Datalog program $\Pi$
is a Datalog clause of the form

$\leftarrow \alpha_1, \ldots, \alpha_m$

where $m > 0$ and each $\alpha_i$ is a Datalog atom. An answer to a query $Q$ is a substitution
$\theta$ for the variables of $Q$. An answer is correct with respect to the Datalog program $\Pi$ if
$\Pi \models Q\theta$. The answer set to a query $Q$ is the set of answers to $Q$ that are correct
w.r.t. $\Pi$ and such that $Q\theta$ is ground. In other words, the answer set to a query $Q$ is the
set of all ground instances of $Q$ which are logical consequences of $\Pi$. Answers are computed by
refutation.
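
As an illustration of query answering over a positive Datalog program, the sketch below computes the answer set of a query. It deliberately swaps refutation for naive bottom-up evaluation (iterating the rules to a fixpoint and then matching the query against the resulting facts), which yields the same answers for positive Datalog. The tuple encoding of atoms, the convention that variables start with an upper-case letter, and the `parent`/`grandparent` program are all assumptions made for the example.

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return term[:1].isupper()

def match(atom, fact, theta):
    """Extend substitution theta so that atom instantiated by theta equals fact, else None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    theta = dict(theta)
    for a, f in zip(args, fargs):
        if is_var(a):
            if theta.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return theta

def answers(body, facts):
    """All substitutions making every atom of a conjunctive body true in facts."""
    thetas = [{}]
    for atom in body:
        thetas = [t2 for t in thetas for fact in facts
                  if (t2 := match(atom, fact, t)) is not None]
    return thetas

def instantiate(atom, theta):
    pred, args = atom
    return (pred, tuple(theta.get(a, a) for a in args))

def least_model(edb, idb):
    """Naive bottom-up evaluation: apply all rules until no new fact is derived."""
    facts = set(edb)
    while True:
        new = {instantiate(head, t) for head, body in idb for t in answers(body, facts)}
        if new <= facts:
            return facts
        facts |= new

# Hypothetical program: EDB facts plus one non-recursive IDB rule.
EDB = {("parent", ("ann", "bob")), ("parent", ("bob", "carl"))}
IDB = [(("grandparent", ("X", "Z")),
        [("parent", ("X", "Y")), ("parent", ("Y", "Z"))])]

# Query  <- grandparent(ann, W)
QUERY = [("grandparent", ("ann", "W"))]
print(answers(QUERY, least_model(EDB, IDB)))  # [{'W': 'carl'}]
```

Each returned substitution $\theta$ makes $Q\theta$ ground and a logical consequence of the program, which is exactly the condition for membership in the answer set.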

Disjunctive Datalog (denoted as Datalog$^{\vee}$) is a variant of Datalog where
disjunctions may appear in the rule heads [10]. Therefore Datalog$^{\vee}$ cannot be considered
as a fragment of HCL. Advanced versions (Datalog$^{\neg\vee}$) also allow for negation in the
bodies, which can be handled according to a semantics for negation in CLs. Defining the
semantics of a Datalog$^{\neg\vee}$ program is complicated by the presence of disjunction in
the rules' heads because it makes the underlying disjunctive logic programming
inherently nonmonotonic, i.e. new information can invalidate previous conclusions. Among
the many alternatives, one widely accepted semantics for Datalog$^{\neg\vee}$ is the extension
of the stable model semantics to the disjunctive case.

1.2. Logic Programming and ontologies 

The integration of LP and ontologies follows the tradition of KR research on so-called
hybrid systems, i.e. those systems which are constituted by two or more subsystems
dealing with distinct portions of a single KB by performing specific reasoning procedures
[15]. The motivation for investigating and developing such systems is to improve on
two basic features of KR formalisms, namely representational adequacy and deductive
power, while preserving the other crucial feature, i.e. decidability. Indeed DLs and CLs are
FOL fragments that are incomparable as for expressiveness [1] and semantics [43] but
combinable at different degrees of integration: tight, loose, full.

The semantic integration is tight when a model of the hybrid KB is defined as the
union of two models, one for the DL part and one for the CL part, which share the same
domain. In particular, combining DLs with CLs in a tight manner can easily lead to
undecidability if the interaction scheme between the DL and the CL part of a hybrid KB does
not solve the semantic mismatch between DLs and CLs [44]. This requirement is known
as DL-safety [38]. With respect to this property, the hybrid KR system Carin [23] is
unsafe because the interaction scheme is left unrestricted. Conversely, $\mathcal{AL}$-log [8]
guarantees a safe interaction scheme by means of syntactic restrictions. Finally,
$\mathcal{DL}$+log$^{\neg\vee}$ [45] is weakly DL-safe because it relaxes the condition of DL-safety.
The distinguishing features of these three KR frameworks are summarized in Table 1 and further
discussed in Sections 1.2.1, 1.2.2 and 1.2.3, respectively.

The semantic integration is loose when the DL part and the CL part are separate 
components connected through a minimal interface for exchanging knowledge. An ex- 
ample of one such kind of coupling is the integration scheme for ASP and DLs illustrated 

We prefer $\mathcal{DL}$+log$^{\neg\vee}$ to the original name $\mathcal{DL}$+log in order to
emphasize the NM features of the language.



Table 1. Three KR frameworks suitable for representing onto-relational rules.

| | Carin [23] | $\mathcal{AL}$-log [8] | $\mathcal{DL}$+log$^{\neg\vee}$ [45] |
|---|---|---|---|
| DL language | any DL | $\mathcal{ALC}$ | any DL |
| CL language | Horn clauses | Datalog clauses | Datalog$^{\neg\vee}$ clauses |
| integration | tight DL-unsafe | tight DL-safe | tight weakly DL-safe |
| rule head literals | DL/Horn literals | Datalog literal | DL/Datalog literals |
| rule body literals | DL/Horn literals | $\mathcal{ALC}$/Datalog literals (no roles) | DL/Datalog$^{\neg}$ literals |
| semantics | Herbrand models + DL models | idem | stable models + DL models |
| reasoning | SLD-resolution + tableau calculus | idem | stable model computation + Boolean CQ/UCQ containment |
| decidability | only for some instantiations | yes | for all instantiations with DLs for which the Boolean CQ/UCQ containment is decidable |
| implementation | yes, e.g. [18] | yes, e.g. [48] | unknown |



in [11]. It derives from the previous work of the same authors on the extension of ASP
with higher-order reasoning and external evaluations [12], which has been implemented
in the system dlvhex.

The semantic integration is full when there is no separation between the vocabularies of
the two parts of the hybrid KB. One such kind of coupling is achieved by means of the
logic of Minimal Knowledge and Negation as Failure in [37].

A complete picture of the computational properties of systems combining DL
ontologies and Datalog rules can be found in [46]. An updated survey of the literature on
hybrid DL-CL systems is suggested for further reading.

1.2.1. Carin 

A comprehensive study of the effects of combining DLs and CLs (more precisely, Horn
rules) can be found in [23]. Special attention is devoted to the DL $\mathcal{ALCNR}$. The
results of the study can be summarized as follows: (i) answering conjunctive queries over
$\mathcal{ALCNR}$ TBoxes is decidable, (ii) query answering in $\mathcal{ALCNR}$ extended with
non-recursive Datalog rules, where both concepts and roles can occur in rule bodies, is
also decidable, as it can be reduced to answering a union of conjunctive queries (UCQ),
(iii) if rules are recursive, query answering becomes undecidable, (iv) decidability can be
regained by disallowing certain combinations of constructors in the logic, and (v)
decidability can be regained by requiring rules to be role-safe, i.e. at least one variable from
each role literal must occur in some non-DL atom. The integration framework proposed
in [23] and known as Carin is therefore DL-unsafe. Reasoning in Carin is based on
constrained SLD-resolution, i.e. an extension of SLD-resolution with a tableau calculus
for DLs to deal with DL literals in the rules. Constrained SLD-refutation is a complete
and sound method for answering ground queries.

A UCQ over a predicate alphabet $P$ is a FOL sentence of the form
$\exists \vec{X}.\, conj_1(\vec{X}) \vee \ldots \vee conj_n(\vec{X})$,
where $\vec{X}$ is a tuple of variable symbols and each $conj_i(\vec{X})$ is a set of atoms whose
predicates are in $P$ and whose arguments are either constants or variables from $\vec{X}$.
A CQ is a UCQ with $n = 1$.



1.2.2. $\mathcal{AL}$-log

$\mathcal{AL}$-log is a hybrid KR system that safely integrates the DL $\mathcal{ALC}$ and
Datalog [8]. In particular, variables occurring in the body of rules may be constrained with
$\mathcal{ALC}$ concept assertions to be used as 'typing constraints'. This makes rules
applicable only to explicitly named objects. As in Carin, query answering is decided using
constrained SLD-resolution which, however, in $\mathcal{AL}$-log is decidable and runs in
single non-deterministic exponential time.

1.2.3. $\mathcal{DL}$+log$^{\neg\vee}$

The hybrid KR framework of $\mathcal{DL}$+log$^{\neg\vee}$ allows a $\mathcal{DL}$ KB, i.e. a KB
expressed in any DL, to be extended with weakly DL-safe Datalog$^{\neg\vee}$ rules [45]. Weak
DL-safeness overcomes the main representational limits of the DL-safe approaches, e.g. it
makes it possible to express UCQs, while keeping the integration scheme decidable. For
$\mathcal{DL}$+log$^{\neg\vee}$ two semantics have been defined: a FOL semantics and a NM
semantics. In particular, the latter extends the stable model semantics of Datalog$^{\neg\vee}$.
According to it, DL-predicates are still interpreted under the Open World Assumption (OWA),
while Datalog-predicates are interpreted under the CWA. Notice that, under both semantics,
entailment can be reduced to satisfiability and, analogously, that CQ answering can be reduced
to satisfiability. The problem statement of satisfiability for finite
$\mathcal{DL}$+log$^{\neg\vee}$ KBs relies on the problem known as the Boolean CQ/UCQ
containment problem in $\mathcal{DL}$. It is shown that the decidability of reasoning in
$\mathcal{DL}$+log$^{\neg\vee}$, thus of ground query answering, depends on the decidability
of the Boolean CQ/UCQ containment problem in $\mathcal{DL}$. Currently, $\mathcal{SHIQ}$ is one
of the most expressive DLs for which this problem is decidable [17].

1.3. Inductive Logic Programming 

Inductive Logic Programming (ILP) was born at the intersection between LP and ML
[39]. From LP it has borrowed the KR framework, i.e. HCL. From ML (more precisely,
from Concept Learning) it has inherited the inferential mechanisms for induction, the
most prominent of which is generalization. However, a distinguishing feature of ILP
with respect to other forms of Concept Learning is the use of prior knowledge of the
domain of interest, called background knowledge (BK). Therefore, induction with ILP
generalizes from individual instances/observations in the presence of BK, finding valid
hypotheses. Validity depends on the underlying setting. At present, there exist several
formalizations of induction in ILP that can be classified according to the following two
orthogonal dimensions: the scope of induction (discrimination vs characterization) and
the representation of observations (ground definite clauses vs ground unit clauses).
Discriminant induction aims at inducing hypotheses with discriminant power as required in
tasks like classification. In classification, observations encompass both positive and
negative examples. Characteristic induction is more suitable for finding regularities in a data
set. This corresponds to learning from positive examples only. The second dimension
affects the notion of coverage, i.e. the condition under which a hypothesis explains an
observation. In learning from entailment, hypotheses are clausal theories, observations
are ground definite clauses, and a hypothesis covers an observation if the hypothesis
logically entails the observation. In learning from interpretations, hypotheses are clausal
theories, observations are Herbrand interpretations (sets of ground unit clauses) and a
hypothesis covers an observation if the observation is a model for the hypothesis.

The Boolean CQ/UCQ containment problem was called existential entailment in [23].

In Concept Learning, generalization is traditionally viewed as search through a
partially ordered space of inductive hypotheses [36]. According to this vision, an inductive
hypothesis in ILP is a clausal theory and the induction of a single clause requires (i)
structuring, (ii) searching and (iii) bounding the space of clauses [40]. First we focus on
(i) by clarifying the notion of ordering for clauses. An ordering allows for determining
which one, between two clauses, is more general than the other. Since partial orders are
considered, incomparable pairs of clauses are admitted. Given the usefulness of BK,
orders have been proposed that reckon with it. Among them, generalized subsumption [2]
is of major interest to this chapter: given two definite clauses $C$ and $D$ standardized apart
and a definite program $\mathcal{K}$, we say that $C \succeq_{\mathcal{K}} D$ iff there exists a
ground substitution $\theta$ for $C$ such that (i) $head(C)\theta = head(D)\sigma$ and (ii)
$\mathcal{K} \cup body(D)\sigma \models body(C)\theta$, where $\sigma$ is a Skolem substitution
for $D$ with respect to $\{C\} \cup \mathcal{K}$. Generalized subsumption is also
called semantic generality in contrast to other orders which are purely syntactic. In the
general case, it is undecidable. However, for Datalog it is decidable and admits a least
general generalization. Once structured, the space of hypotheses can be searched (ii) by
means of refinement operators. A refinement operator is a function which computes a
set of specializations or generalizations of a clause according to whether a top-down or
a bottom-up search is performed. The two kinds of refinement operator have therefore
been called downward and upward, respectively. The definition of refinement operators
presupposes the investigation of the properties of the various orderings and is usually
coupled with the specification of a declarative bias for bounding the space of clauses
(iii). Bias concerns anything which constrains the search for theories, e.g. a language
bias specifies syntactic constraints such as linkedness and connectedness on the clauses
in the search space. A definite clause $C$ is linked if each literal $l_i \in C$ is linked. A literal
$l_i \in C$ is linked if at least one of its terms is linked. A term $t$ in some literal $l_i \in C$
is linked with linking-chain of length 0, if $t$ occurs in $head(C)$, and with linking-chain
of length $d + 1$, if some other term in $l_i$ is linked with linking-chain of length $d$. The
link-depth of a term $t$ in $l_i$ is the length of the shortest linking-chain of $t$. A clause $C$ is
connected if each variable occurring in $head(C)$ also occurs in $body(C)$.
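
The linkedness and connectedness conditions can be checked mechanically. The following sketch is one possible reading of the definitions above, under several assumptions that are not in the text: clauses are function-free, atoms are encoded as hypothetical (predicate, arguments) tuples, variables start with an upper-case letter, and only variables (not constants) are tracked when computing link-depths. Link-depths are obtained by a breadth-first search that starts from the head variables.

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return term[:1].isupper()

def link_depths(head, body):
    """Map each linked variable to the length of its shortest linking-chain."""
    depth = {v: 0 for v in head[1] if is_var(v)}
    frontier = set(depth)
    d = 0
    while frontier:
        d += 1
        reached = set()
        for _, args in body:
            # A literal sharing a term with the frontier links its other variables.
            if frontier & set(args):
                reached |= {v for v in args if is_var(v) and v not in depth}
        for v in reached:
            depth[v] = d
        frontier = reached
    return depth

def is_linked(head, body):
    """Every body literal must contain at least one linked term."""
    depth = link_depths(head, body)
    return all(any(t in depth for t in args) for _, args in body)

def is_connected(head, body):
    """Every head variable must also occur in the body."""
    body_vars = {t for _, args in body for t in args if is_var(t)}
    return all(v in body_vars for v in head[1] if is_var(v))

# Hypothetical clause  h(X) <- p(X, Y), q(Y, Z)
HEAD = ("h", ("X",))
BODY = [("p", ("X", "Y")), ("q", ("Y", "Z"))]
print(link_depths(HEAD, BODY))                       # {'X': 0, 'Y': 1, 'Z': 2}
print(is_linked(HEAD, BODY), is_connected(HEAD, BODY))  # True True
```

For instance, replacing `q(Y, Z)` with `q(W)` would make the clause unlinked, since `W` cannot be reached from the head through shared terms.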

2. ILP for Onto-Relational Rule Learning: State of the Art 

Hybrid KR systems combining DLs and CLs with a tight integration scheme have very
recently attracted some attention in the ILP community: [47] chooses Carin-$\mathcal{ALN}$, [24]
resorts to $\mathcal{AL}$-log, and [27] builds upon $\mathcal{SHIQ}$+log. A comparative analysis
of the three is reported in Table 2. They can be considered as attempts at accommodating
ontologies in ILP. Indeed, they can deal with $\mathcal{ALN}$, $\mathcal{ALC}$, and $\mathcal{SHIQ}$
ontologies, respectively. We remind the reader that $\mathcal{ALN}$ and $\mathcal{ALC}$ are
incomparable DLs whereas DLs in the $\mathcal{SH}$ family enrich $\mathcal{ALC}$ with further
constructors.

Closely related to KR systems integrating DLs and CLs are the hybrid formalisms
arising from the study of many-sorted logics, where a FOL language is combined with
a sort language which can be regarded as an elementary DL [13]. In this respect the
study of a sorted downward refinement [14] can also be considered as a contribution to
the problem of interest to this chapter. Finally, some work has been done on discovering
frequent association patterns in the form of DL-safe rules [20].



2.1. Learning Carin-$\mathcal{ALN}$ rules

The framework proposed in [47] focuses on discriminant induction and adopts the ILP
setting of learning from interpretations. Hypotheses are represented as Carin-$\mathcal{ALN}$
non-recursive rules with a Horn literal in the head that plays the role of target concept.
The coverage relation of hypotheses against examples adapts the usual one in learning
from interpretations to the case of hybrid Carin-$\mathcal{ALN}$ BK. The generality relation
between two hypotheses is defined as an extension of generalized subsumption. Procedures
for testing both the coverage relation and the generality relation are based on the
existential entailment algorithm of Carin. Following [47], Kietz studies the learnability of
Carin-$\mathcal{ALN}$, thus providing a pre-processing method which enables ILP systems to
learn Carin-$\mathcal{ALN}$ rules [21].

2.2. Learning $\mathcal{AL}$-log rules

In [24], hypotheses are represented as constrained Datalog clauses that are linked,
connected (or range-restricted), and compliant with the bias of Object Identity (OI). Unlike
[47], this framework is general, meaning that it is valid whatever the scope of induction
is. The generality relation for one such hypothesis language is an adaptation of
generalized subsumption, named S-subsumption, to the $\mathcal{AL}$-log KR framework. It gives
rise to a quasi-order and can be checked with a decidable procedure based on constrained
SLD-resolution [30]. Coverage relations for both ILP settings of learning from
interpretations and learning from entailment have been defined on the basis of query answering
in $\mathcal{AL}$-log [26]. As opposed to [47], the framework has been implemented in an ILP
system [32,33]. More precisely, an instantiation of it for the case of characteristic
induction from interpretations has been considered. Indeed, the system supports a variant
of a very popular data mining task, frequent pattern discovery, where rich prior
conceptual knowledge is taken into account during the discovery process in order to find
patterns at multiple levels of description granularity. The search through the space of
patterns represented as unary conjunctive queries in $\mathcal{AL}$-log and organized according
to S-subsumption is performed by applying an ideal downward refinement operator [31].

2.3. Learning $\mathcal{SHIQ}$+log rules

The ILP framework presented in [27] represents hypotheses as $\mathcal{SHIQ}$+log rules and
organizes them according to a generality ordering inspired by generalized subsumption.
The resulting hypothesis space can be searched by means of refinement operators either
top-down or bottom-up. Analogously to [24], this framework encompasses both scopes
of induction but, differently from [24], it assumes the ILP setting of learning from
entailment only. Both the coverage relation and the generality relation boil down to query
answering in $\mathcal{SHIQ}$+log, and thus can be reformulated as satisfiability problems.
Compared to [47] and [24], this framework shows an added value which can be summarized
as follows. First, it relies on a more expressive DL, i.e. $\mathcal{SHIQ}$. Second, it allows for
inducing definitions for new DL concepts, i.e. rules with a $\mathcal{SHIQ}$ literal in the head.

The OI bias can be considered as an extension of the Unique Names Assumption (UNA) from the
semantic level to the syntactic one of $\mathcal{AL}$-log. It can be the starting point for the
definition of either an equational theory or a quasi-order for constrained Datalog clauses.



Table 2. Three ILP frameworks suitable for learning onto-relational rules.

| | Learning Carin-$\mathcal{ALN}$ rules [47] | Learning $\mathcal{AL}$-log rules [24] | Learning $\mathcal{SHIQ}$+log rules [27] |
|---|---|---|---|
| prior knowledge | Carin-$\mathcal{ALN}$ KB | $\mathcal{AL}$-log KB | $\mathcal{SHIQ}$+log KB |
| ontology language | $\mathcal{ALN}$ | $\mathcal{ALC}$ | $\mathcal{SHIQ}$ |
| rule language | HCL | Datalog | Datalog |
| hypothesis language | Carin-$\mathcal{ALN}$ non-recursive rules | $\mathcal{AL}$-log non-recursive rules | $\mathcal{SHIQ}$+log non-recursive rules |
| target predicate | Horn predicate | Datalog predicate | $\mathcal{SHIQ}$/Datalog predicate |
| logical setting | interpretations | interpretations/entailment | entailment |
| scope of induction | prediction | prediction/description | prediction/description |
| generality order | extension of [2] to Carin-$\mathcal{ALN}$ | extension of [2] to $\mathcal{AL}$-log | extension of [2] to $\mathcal{SHIQ}$+log |
| coverage test | Carin query answering | $\mathcal{AL}$-log query answering | $\mathcal{DL}$+log$^{\neg\vee}$ query answering |
| refinement operators | n.a. | downward | downward/upward |
| implementation | unknown | yes, see [33] | no |
| application | no | yes, see [32] | no |



Third, it adopts a more flexible form of integration between the DL and the CL part, i.e.
the weakly-safe one.

The work reported in [29,25] generalizes the results of [27] to any decidable
instantiation of $\mathcal{DL}$+log$^{\neg\vee}$. The following section illustrates how learning
$\mathcal{DL}$+log$^{\neg}$ rules can support the evolution of ontologies.

3. Learning Rule-based Definitions of $\mathcal{DL}$ Concepts and Roles with ILP

In KE, Ontology Evolution is the timely adaptation of an ontology to changed business
requirements, to trends in ontology instances and patterns of usage of the ontology-based
application, as well as the consistent management/propagation of these changes to
dependent elements [50]. As opposed to Ontology Modification, Ontology Evolution must
preserve the consistency of the ontology. According to [41] one can distinguish between
conceptual, specification and representation changes.

In this section we consider the conceptual changes of a $\mathcal{DL}$ ontology due to
previously unknown but classified extensional knowledge (i.e., facts of the instance level of
the ontology) which may become available. In particular, we consider the task of defining
new concepts or roles which provide the intensional counterpart of such extensional
knowledge and show how this task can be reformulated as an ORL problem [28]. For
example, the new facts LONER(Joe), LONER(Mary), and LONER(Paul) concerning known
individuals may raise the need for having a definition of the concept LONER in the
ontology. One such definition can be learned from these facts together with prior knowledge
about Joe, Mary and Paul, i.e. facts concerning them and already available in the
ontology. A crucial requirement is that the definition must be expressed as a $\mathcal{DL}$
formula or similar. In the following we provide the means for learning rule-based definitions
of $\mathcal{DL}$ concepts/roles in the KR framework of $\mathcal{DL}$+log$^{\neg}$.

3.1. The learning problem 

We assume that a $\mathcal{DL}$ ontology $\Sigma = (\mathcal{T}, \mathcal{A})$ is integrated with
a Datalog$^{\neg}$ database $\Pi$ to form a $\mathcal{DL}$+log$^{\neg}$ KB $\mathcal{B}$. The
problem of inducing rule-based definitions of $\mathcal{DL}$ concepts/roles that do not occur in
$\mathcal{B}$ can be formalized as follows.



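Definition 1 Given:

• a $\mathcal{DL}$+log$^{\neg}$ KB $\mathcal{B}$ (background theory)

• a $\mathcal{DL}$ predicate name $p$ (target predicate)

• a set $\mathcal{E} = \mathcal{E}^+ \cup \mathcal{E}^-$ of $\mathcal{DL}$ assertions that are either true or false for $p$ (examples)

• a set $\mathcal{L}$ of $\mathcal{DL}$+log$^{\neg}$ definitions for $p$ (language of hypotheses)

the problem of building a rule-based definition of $p$ is to induce a set
$\mathcal{H} \subset \mathcal{L}$ (hypothesis) of $\mathcal{DL}$+log$^{\neg}$ rules from
$\mathcal{E}$ and $\mathcal{B}$ such that:

Completeness $\forall e \in \mathcal{E}^+$: $\mathcal{H}$ covers $e$ w.r.t. $\mathcal{B}$
Consistency $\forall e \in \mathcal{E}^-$: $\mathcal{H}$ does not cover $e$ w.r.t. $\mathcal{B}$.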
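
The two conditions of Definition 1 can be sketched as simple checks parameterized by a coverage test. Everything concrete below is an assumption made for illustration: `naive_covers` is a deliberately crude stand-in that only matches positive unary body atoms against explicitly asserted facts, and it ignores the TBox, NAF literals and the NM semantics, so it is not the coverage test of $\mathcal{DL}$+log$^{\neg}$. The split of LONER assertions into positive and negative examples is likewise hypothetical.

```python
def naive_covers(hypothesis, facts, example):
    """Toy coverage test (an assumption): some rule for the target predicate has all
    of its positive unary body atoms asserted as facts about the individual."""
    pred, individual = example
    return any(head == pred and all((q, individual) in facts for q in body)
               for head, body in hypothesis)

def is_complete(hypothesis, facts, positives, covers=naive_covers):
    """Completeness: every positive example is covered."""
    return all(covers(hypothesis, facts, e) for e in positives)

def is_consistent(hypothesis, facts, negatives, covers=naive_covers):
    """Consistency: no negative example is covered."""
    return not any(covers(hypothesis, facts, e) for e in negatives)

# Ground facts about three individuals (illustrative).
FACTS = {
    ("famous", "Mary"), ("famous", "Paul"), ("famous", "Joe"),
    ("UNMARRIED", "Mary"), ("UNMARRIED", "Joe"), ("scientist", "Joe"),
}
# Hypothetical example sets for the target predicate LONER.
POSITIVES = [("LONER", "Mary"), ("LONER", "Joe")]
NEGATIVES = [("LONER", "Paul")]

# Candidate hypotheses, each a list of (head predicate, positive body predicates).
H1 = [("LONER", ["famous"])]
H2 = [("LONER", ["famous", "UNMARRIED"])]

for name, h in (("H1", H1), ("H2", H2)):
    print(name, is_complete(h, FACTS, POSITIVES), is_consistent(h, FACTS, NEGATIVES))
```

Under these assumptions the first hypothesis is complete but inconsistent, because it also covers the negative example, while the more specific second hypothesis satisfies both conditions. A real implementation would pass a `covers` function backed by query answering in the hybrid KB.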

The background theory $\mathcal{B}$ in Definition 1 can be split into an intensional part
$\mathcal{K}$ (i.e., the TBox $\mathcal{T}$ plus $IDB(\Pi)$) and an extensional part $\mathcal{F}$
(i.e., the ABox $\mathcal{A}$ plus $EDB(\Pi)$). Also we denote by $P_C(\mathcal{B})$,
$P_R(\mathcal{B})$, and $P_D(\mathcal{B})$ the sets of concept, role and Datalog
predicate names occurring in $\mathcal{B}$, respectively. Note that
$p \notin P_C(\mathcal{B}) \cup P_R(\mathcal{B})$.

Example 1 Suppose we have a $\mathcal{DL}$+log$^{\neg}$ KB $\mathcal{B}$ (adapted from [45]) built
upon the alphabets $P_C(\mathcal{B})$ = {RICH/1, UNMARRIED/1}, $P_R(\mathcal{B})$ =
{WANTS-TO-MARRY/2, LOVES/2}, and $P_D(\mathcal{B})$ = {famous/1, scientist/1, meets/3} and
consisting of the following intensional knowledge $\mathcal{K}$:

[A1] RICH $\sqcap$ UNMARRIED $\sqsubseteq$ $\exists$ WANTS-TO-MARRY$^{-}$.$\top$

[A2] WANTS-TO-MARRY $\sqsubseteq$ LOVES

[R1] RICH(X) $\leftarrow$ famous(X), $\neg$scientist(X)

[R2] happy(X) $\leftarrow$ famous(X), WANTS-TO-MARRY(Y, X)

and the following extensional knowledge $\mathcal{F}$:

UNMARRIED(Mary)

UNMARRIED(Joe)

famous(Mary)

famous(Paul)

famous(Joe)

scientist(Joe)

meets(Mary, Paul, Italy)

meets(Mary, Joe, Germany)

meets(Joe, Mary, Italy)

that concerns the individuals Mary, Joe, Paul, Italy, and Germany.

The hypothesis language $\mathcal{L}$ in Definition 1 is given as a set of declarative bias constraints. It allows for the generation of $\mathcal{DL}$+log$^{\neg}$ rules starting from three disjoint alphabets $P_C(\mathcal{L}) \subseteq P_C(\mathcal{B})$, $P_R(\mathcal{L}) \subseteq P_R(\mathcal{B})$, and $P_D(\mathcal{L}) \subseteq P_D(\mathcal{B})$. Also we distinguish between $P_D^+(\mathcal{L})$ and $P_D^-(\mathcal{L})$ in order to specify which Datalog predicates can occur in positive and negative literals, respectively. More precisely, we consider $\mathcal{DL}$+log$^{\neg}$ rules of the form

$$p(\vec{X}) \leftarrow r_1(\vec{Y}_1), \ldots, r_m(\vec{Y}_m), s_1(\vec{Z}_1), \ldots, s_k(\vec{Z}_k), \neg u_1(\vec{W}_1), \ldots, \neg u_q(\vec{W}_q) \qquad (1)$$

where $m, k, q \geq 0$, $p(\vec{X})$ and each $r_j(\vec{Y}_j)$, $s_l(\vec{Z}_l)$, $u_t(\vec{W}_t)$ is an atom with $r_j \in P_D^+(\mathcal{L})$, $s_l \in P_C(\mathcal{L}) \cup P_R(\mathcal{L})$, and $u_t \in P_D^-(\mathcal{L})$. The admissible rules must be compliant with the following restrictions:

Datalog-safeness: every variable occurring in (1) must appear in at least one of the atoms $r_1(\vec{Y}_1), \ldots, r_m(\vec{Y}_m), s_1(\vec{Z}_1), \ldots, s_k(\vec{Z}_k)$;

weak $\mathcal{DL}$-safeness: every head variable of (1) must appear in at least one of the atoms $r_1(\vec{Y}_1), \ldots, r_m(\vec{Y}_m)$.

These restrictions also guarantee that the conditions of linkedness and connectedness, usually assumed in ILP, are satisfied.
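The two safeness restrictions are purely syntactic, so they can be checked by comparing sets of variables. The Python fragment below is a minimal sketch of such a check; the classes `Atom` and `Rule` and the two function names are hypothetical, and every argument of an atom is assumed to be a variable (rules of the hypothesis language are taken to be constant-free here).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple  # every argument is treated as a variable name (assumption)


@dataclass
class Rule:
    head: Atom
    datalog_pos: list = field(default_factory=list)  # atoms r_j(Y_j)
    dl_atoms: list = field(default_factory=list)     # atoms s_l(Z_l)
    datalog_neg: list = field(default_factory=list)  # atoms u_t(W_t), read as negated


def vars_of(atoms):
    """Collect the variables occurring in a list of atoms."""
    return {v for atom in atoms for v in atom.args}


def is_datalog_safe(rule):
    """Every variable of the rule occurs in a positive body atom."""
    positive = vars_of(rule.datalog_pos + rule.dl_atoms)
    everything = vars_of([rule.head] + rule.datalog_pos
                         + rule.dl_atoms + rule.datalog_neg)
    return everything.issubset(positive)


def is_weakly_dl_safe(rule):
    """Every head variable occurs in a positive Datalog atom."""
    return vars_of([rule.head]).issubset(vars_of(rule.datalog_pos))
```

For instance, a rule with head LONER(X) and the single body atom UNMARRIED(X) passes the first check but fails the second, because X occurs only in a $\mathcal{DL}$ atom.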

Example 2 Suppose that the target predicate is the $\mathcal{DL}$ concept LONER. If $\mathcal{L}^{LONER}$ is defined over $P_D^+(\mathcal{L}^{LONER}) \cup P_D^-(\mathcal{L}^{LONER}) \cup P_C(\mathcal{L}^{LONER})$ = {famous/1} $\cup$ {happy/1} $\cup$ {RICH/1, UNMARRIED/1}, then the following $\mathcal{DL}$+log$^{\neg}$ rules

$h_1^{LONER}$ LONER(X) $\leftarrow$ famous(X)

$h_2^{LONER}$ LONER(X) $\leftarrow$ famous(X), UNMARRIED(X)

$h_3^{LONER}$ LONER(X) $\leftarrow$ famous(X), $\neg$happy(X)

belong to $\mathcal{L}^{LONER}$ and represent hypotheses of a definition for LONER.

Example 3 Suppose now that the $\mathcal{DL}$ role LIKES is the target predicate and the set $P_D^+(\mathcal{L}^{LIKES}) \cup P_C(\mathcal{L}^{LIKES}) \cup P_R(\mathcal{L}^{LIKES})$ = {happy/1, meets/3} $\cup$ {RICH/1} $\cup$ {LOVES/2, WANTS-TO-MARRY/2} provides the building blocks for the language $\mathcal{L}^{LIKES}$. The following $\mathcal{DL}$+log$^{\neg}$ rules

$h_1^{LIKES}$ LIKES(X,Y) $\leftarrow$ meets(X,Z,Y)

$h_2^{LIKES}$ LIKES(X,Y) $\leftarrow$ meets(X,Z,Y), happy(X)

$h_3^{LIKES}$ LIKES(X,Y) $\leftarrow$ meets(X,Z,Y), RICH(Z)

belonging to $\mathcal{L}^{LIKES}$ can be considered hypotheses of a definition for LIKES.

The set $\mathcal{E}$ of examples in Definition 1 contains assertions of the kind $p(\vec{a}_i)$ where $p$ is the target predicate and $\vec{a}_i$ is a tuple of individuals occurring in the ABox $\mathcal{A}$. Note that, when $p$ is a role name, the tuple is a pair $\langle a_i^1, a_i^2 \rangle$ of individuals. We assume $\mathcal{B} \cap \mathcal{E} = \emptyset$. However, a possibly incomplete description of each $e_i \in \mathcal{E}$ is in $\mathcal{B}$.

Example 4 With reference to Example 2, suppose that the following concept assertions:

$e_1^{LONER}$ LONER(Mary)

$e_2^{LONER}$ LONER(Joe)

$e_3^{LONER}$ LONER(Paul)

are examples for the target predicate LONER.

Example 5 With reference to Example 3, the following role assertions:

$e_1^{LIKES}$ LIKES(Mary,Italy)

$e_2^{LIKES}$ LIKES(Mary,Germany)

$e_3^{LIKES}$ LIKES(Joe,Italy)

can be assumed as examples for the target predicate LIKES.



3.2. The ingredients for an ILP solution 



In order to solve the learning problem at hand with the ILP methodological approach, the language $\mathcal{L}$ of hypotheses needs to be equipped with (i) a coverage relation which defines the mappings from $\mathcal{L}$ to the set $\mathcal{E}$ of examples, and (ii) a generality order $\succeq$ such that $(\mathcal{L}, \succeq)$ is a search space.

The definition of a coverage relation depends on the representation choice for examples. The normal ILP setting is the most appropriate to the learning problem at hand and can be extended to the $\mathcal{DL}$+log$^{\neg}$ framework depicted in Definition 1 as follows.

Definition 2 We say that a rule $h \in \mathcal{L}$ covers (does not cover, resp.) an example $e_i = p(\vec{a}_i) \in \mathcal{E}$ w.r.t. a background theory $\mathcal{B}$ iff $\mathcal{B} \cup h \models p(\vec{a}_i)$ ($\mathcal{B} \cup h \not\models p(\vec{a}_i)$, resp.).

Note that the coverage test can be reduced to query answering w.r.t. a $\mathcal{DL}$+log$^{\neg}$ KB, which in turn can be reformulated as a satisfiability problem of the KB.
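To give a feeling for the coverage test as a query answering task, the following Python fragment sketches a deliberately simplified version of it. It is only an approximation under strong assumptions: the body of the rule contains positive atoms only, the intensional knowledge (axioms and rules of the KB) is ignored, and entailment is replaced by pattern matching against the explicit facts of Example 1. The names `FACTS`, `match` and `covers` are hypothetical; a faithful implementation would delegate the test to a $\mathcal{DL}$+log$^{\neg}$ reasoner.

```python
# Explicit facts of Example 1, encoded as (predicate, arguments) pairs.
FACTS = [
    ("UNMARRIED", ("Mary",)), ("UNMARRIED", ("Joe",)),
    ("famous", ("Mary",)), ("famous", ("Paul",)), ("famous", ("Joe",)),
    ("scientist", ("Joe",)),
    ("meets", ("Mary", "Paul", "Italy")),
    ("meets", ("Mary", "Joe", "Germany")),
    ("meets", ("Joe", "Mary", "Italy")),
]


def match(body, facts, subst):
    """Yield substitutions extending `subst` that map each body atom to a fact."""
    if not body:
        yield subst
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(subst)
        # Bind each variable, rejecting the fact on a conflicting binding.
        if all(extended.setdefault(var, const) == const
               for var, const in zip(args, fact_args)):
            yield from match(rest, facts, extended)


def covers(head_args, body, example_args, facts=FACTS):
    """Approximate coverage: bind the head to the example, then match the body."""
    subst = dict(zip(head_args, example_args))
    return next(match(body, facts, subst), None) is not None
```

With this encoding, `covers(("X", "Y"), [("meets", ("X", "Z", "Y"))], ("Mary", "Italy"))` succeeds because the fact meets(Mary,Paul,Italy) matches the body once X and Y are bound to the individuals of the example.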

Example 6 With reference to Example 2 and 4, the rule $h_1^{LONER}$ covers the example $e_1^{LONER}$ because all NM-models for $\mathcal{B}' = \mathcal{B} \cup h_1^{LONER}$ do satisfy famous(Mary). It covers also $e_2^{LONER}$ and $e_3^{LONER}$ for analogous reasons. The rule $h_2^{LONER}$ covers only $e_1^{LONER}$ and $e_2^{LONER}$ whereas $h_3^{LONER}$ covers $e_2^{LONER}$ and $e_3^{LONER}$.

Example 7 With reference to Example 3 and 5, the rule $h_1^{LIKES}$ covers the example $e_1^{LIKES}$ because all NM-models for $\mathcal{B}' = \mathcal{B} \cup h_1^{LIKES}$ do satisfy meets(Mary,Z,Italy). It covers also $e_2^{LIKES}$ and $e_3^{LIKES}$ for analogous reasons. The rules $h_2^{LIKES}$ and $h_3^{LIKES}$ each cover only two of the three examples.

The definition of a generality order for hypotheses in $\mathcal{L}$ must consider the peculiarities of the chosen $\mathcal{L}$. Generalized subsumption, subsequently extended in [49] to deal with NAF literals, is suitable for the problem at hand and can be adapted to the case of $\mathcal{DL}$+log$^{\neg}$ rules. In the following we provide a characterization of the resulting generality order, denoted by $\succeq_{\mathcal{K}}$, that relies on the reasoning tasks known for $\mathcal{DL}$+log$^{\neg}$ and from which a test procedure can be derived.

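Definition 3 Let $h_1, h_2 \in \mathcal{L}$ be two $\mathcal{DL}$+log$^{\neg}$ rules standardized apart, $\mathcal{K}$ a $\mathcal{DL}$+log$^{\neg}$ KB, and $\sigma$ a Skolem substitution for $h_2$ with respect to $\{h_1\} \cup \mathcal{K}$. We say that $h_1$ is more general than $h_2$ w.r.t. $\mathcal{K}$, denoted by $h_1 \succeq_{\mathcal{K}} h_2$, iff there exists a ground substitution $\theta$ for $h_1$ such that (i) $head(h_1)\theta = head(h_2)\sigma$ and (ii) $\mathcal{K} \cup body(h_2)\sigma \models body(h_1)\theta$. We say that $h_1$ is strictly more general than $h_2$ w.r.t. $\mathcal{K}$, denoted by $h_1 \succ_{\mathcal{K}} h_2$, iff $h_1 \succeq_{\mathcal{K}} h_2$ and $h_2 \not\succeq_{\mathcal{K}} h_1$. We say that $h_1$ is equivalent to $h_2$ w.r.t. $\mathcal{K}$, denoted by $h_1 \equiv_{\mathcal{K}} h_2$, iff $h_1 \succeq_{\mathcal{K}} h_2$ and $h_2 \succeq_{\mathcal{K}} h_1$.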

Example 8 Let us consider the rules reported in Example 2 up to variable renaming:

$h_1^{LONER}$ LONER(A) $\leftarrow$ famous(A)

$h_2^{LONER}$ LONER(X) $\leftarrow$ famous(X), UNMARRIED(X)

In order to check whether $h_1^{LONER} \succeq_{\mathcal{K}} h_2^{LONER}$ holds, let $\sigma = \{X/a\}$ be a Skolem substitution for $h_2^{LONER}$ with respect to $\mathcal{K} \cup h_1^{LONER}$ and $\theta = \{A/a\}$ a ground substitution for $h_1^{LONER}$. The condition (i) is immediately verified. The condition

(ii) $\mathcal{K} \cup \{$famous(a), UNMARRIED(a)$\} \models \{$famous(a)$\}$

is a ground query answering problem in $\mathcal{DL}$+log$^{\neg}$. It can be easily proved that all NM-models for $\mathcal{K} \cup \{$famous(a), UNMARRIED(a)$\}$ satisfy famous(a). Thus, it is the case that $h_1^{LONER} \succeq_{\mathcal{K}} h_2^{LONER}$. The converse does not hold. Also, $h_1^{LONER} \succeq_{\mathcal{K}} h_3^{LONER}$ and $h_2^{LONER}$ is incomparable with $h_3^{LONER}$.
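The check carried out by hand in Example 8 can be mimicked mechanically in a restricted setting. The Python fragment below is a minimal sketch under simplifying assumptions that are not in the original text: the knowledge base $\mathcal{K}$ is taken to be empty, bodies contain positive atoms only, and entailment in condition (ii) is therefore replaced by set membership. The function name `more_general` and the encoding of rules as (head, body) pairs are hypothetical.

```python
from itertools import product


def ground(atom, subst):
    """Apply a substitution to an atom encoded as (predicate, arguments)."""
    pred, args = atom
    return (pred, tuple(subst[a] for a in args))


def more_general(h1, h2):
    """Approximate test of Definition 3 for an empty K and positive bodies.

    Each rule is a pair (head, body); all arguments are variables.
    """
    (head1, body1), (head2, body2) = h1, h2
    # Skolem substitution sigma: each variable of h2 becomes a fresh constant.
    vars2 = sorted({v for _, args in [head2, *body2] for v in args})
    sigma = {v: "sk_" + v for v in vars2}
    head2_ground = ground(head2, sigma)
    body2_ground = {ground(a, sigma) for a in body2}
    # Search for a ground substitution theta over the Skolem constants.
    vars1 = sorted({v for _, args in [head1, *body1] for v in args})
    constants = sorted(sigma.values())
    for values in product(constants, repeat=len(vars1)):
        theta = dict(zip(vars1, values))
        same_head = ground(head1, theta) == head2_ground        # condition (i)
        body_entailed = all(ground(a, theta) in body2_ground    # condition (ii)
                            for a in body1)
        if same_head and body_entailed:
            return True
    return False
```

Applied to the two rules of Example 8, the function finds the substitution that maps A to the Skolem constant introduced for X, while the test in the opposite direction fails because UNMARRIED cannot be matched.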

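Example 9 With reference to Example 3, it can be proved that $h_1^{LIKES} \succeq_{\mathcal{K}} h_2^{LIKES}$ and $h_1^{LIKES} \succeq_{\mathcal{K}} h_3^{LIKES}$. Conversely, the rules $h_2^{LIKES}$ and $h_3^{LIKES}$ are incomparable. Note that

$h_4^{LIKES}$ LIKES(X,Y) $\leftarrow$ meets(X,Z,Y), LOVES(X,Z)

$h_5^{LIKES}$ LIKES(X,Y) $\leftarrow$ meets(X,Z,Y), WANTS-TO-MARRY(X,Z)

also belong to $\mathcal{L}^{LIKES}$. It can be proved that $h_1^{LIKES} \succeq_{\mathcal{K}} h_4^{LIKES}$, $h_1^{LIKES} \succeq_{\mathcal{K}} h_5^{LIKES}$, and $h_4^{LIKES} \succeq_{\mathcal{K}} h_5^{LIKES}$.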

Note that the decidability of $\succeq_{\mathcal{K}}$ follows from the decidability of $\mathcal{DL}$+log$^{\neg}$. Also it can be proved that $\succeq_{\mathcal{K}}$ is a quasi-order (i.e., it is a reflexive and transitive relation) for $\mathcal{DL}$+log$^{\neg}$ rules, therefore the space $(\mathcal{L}, \succeq_{\mathcal{K}})$ can be searched by refinement operators like the following one, which traverses the hypothesis space top down.

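Definition 4 Let $\mathcal{L}$ be a $\mathcal{DL}$+log$^{\neg}$ hypothesis language built out of the three finite and disjoint alphabets $P_C(\mathcal{L})$, $P_R(\mathcal{L})$, and $P_D^+(\mathcal{L}) \cup P_D^-(\mathcal{L})$. We define a downward refinement operator $\rho$ for $(\mathcal{L}, \succeq_{\mathcal{K}})$ such that, for each $h \in \mathcal{L}$, the set $\rho(h)$ contains all $h' \in \mathcal{L}$ that can be obtained from $h$ by applying one of the following refinement rules:

(AddDataLit_B+) $body(h') = body(h) \cup \{r_{m+1}(\vec{Y}_{m+1})\}$ if

1. $r_{m+1} \in P_D^+(\mathcal{L})$

2. $r_{m+1}(\vec{Y}_{m+1}) \notin body(h)$

3. $var(head(h)) \subseteq var(body(h'))$

(AddOntoLit_B) $body(h') = body(h) \cup \{s_{k+1}(\vec{Z}_{k+1})\}$ if

1. $s_{k+1} \in P_C(\mathcal{L}) \cup P_R(\mathcal{L})$

2. there does not exist any $s_l(\vec{Z}_l) \in body(h)$ such that $s_{k+1} \sqsubseteq s_l$

3. $var(head(h)) \subseteq var(body(h'))$

(SpecOntoLit_B) $body(h') = (body(h) \setminus \{s_l(\vec{Z}_l)\}) \cup \{s'_l(\vec{Z}_l)\}$ if

1. $s'_l \in P_C(\mathcal{L}) \cup P_R(\mathcal{L})$

2. $s'_l \sqsubset s_l$

(AddDataLit_B-) $body(h') = body(h) \cup \{\neg u_{q+1}(\vec{W}_{q+1})\}$ if

1. $u_{q+1} \in P_D^-(\mathcal{L})$

2. $u_{q+1}(\vec{W}_{q+1}) \notin body(h)$

3. $\vec{W}_{q+1} \subseteq var(body^+(h))$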
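Among the refinement rules of Definition 4, (SpecOntoLit_B) is the one that exploits the ontology directly. The Python fragment below is a minimal sketch of it, under the assumption that strict subsumption between predicate names is available as a precomputed lookup table; in a real system this information would be obtained from a DL reasoner. The function name `spec_onto_lit` and the table layout are hypothetical.

```python
def spec_onto_lit(body, dl_alphabet, strictly_below):
    """Return the bodies obtained by specializing one DL atom.

    `body` is a list of (predicate, arguments) pairs, `dl_alphabet` the set
    of concept and role names of the hypothesis language, and
    `strictly_below` maps a DL predicate to the names strictly subsumed by it.
    """
    refinements = []
    for position, (pred, args) in enumerate(body):
        for sub in sorted(strictly_below.get(pred, ())):
            if sub in dl_alphabet:                      # condition 1
                # condition 2 holds by construction of the lookup table
                new_body = body[:position] + [(sub, args)] + body[position + 1:]
                refinements.append(new_body)
    return refinements
```

As a usage example, with the table `{"LOVES": {"WANTS-TO-MARRY"}}` (a reading of an axiom such as WANTS-TO-MARRY $\sqsubseteq$ LOVES, assuming the converse inclusion does not hold), the body meets(X,Z,Y), LOVES(X,Z) is refined into meets(X,Z,Y), WANTS-TO-MARRY(X,Z).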



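function OR-FOIL($\mathcal{B}$, $p$, $\mathcal{E}^+$, $\mathcal{E}^-$, $\mathcal{L}$): $\mathcal{H}$

1. $\mathcal{H} := \emptyset$
2. while $\mathcal{E}^+ \neq \emptyset$ do
3.     $h := \{p(\vec{X}) \leftarrow\}$;
4.     $\mathcal{E}^-_h := \mathcal{E}^-$;
5.     while $\mathcal{E}^-_h \neq \emptyset$ do
6.         $Q := \{h' \in \mathcal{L} \mid h' \in \rho(h)\}$;
7.         $h :=$ OR-Foil-ChooseBest($Q$);
8.         $\mathcal{E}^-_h := \{e \in \mathcal{E}^-_h \mid \mathcal{B} \cup h \models e\}$;
9.     endwhile
10.     $\mathcal{H} := \mathcal{H} \cup \{h\}$;
11.     $\mathcal{E}^+_h := \{e \in \mathcal{E}^+ \mid \mathcal{B} \cup h \models e\}$;
12.     $\mathcal{E}^+ := \mathcal{E}^+ \setminus \mathcal{E}^+_h$
13. endwhile
14. return $\mathcal{H}$

Figure 1. OR-FOIL: A FOIL-like algorithm for learning onto-relational rules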
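The control flow of the procedure in Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the refinement operator, the coverage test and the selection heuristic are passed in as black-box functions, the selection function also receives the current example sets (a deviation from the one-argument call shown in the figure), the initialisation of the covered negatives to the whole set of negative examples before the inner loop is the reading assumed here, and the two guards against empty refinements and useless rules are additions to keep the sketch from looping forever.

```python
def or_foil(positives, negatives, start_rule, refine, covers, choose_best):
    """Sequential covering with a top-down inner search (sketch of Figure 1)."""
    hypothesis = []                       # step 1
    remaining = set(positives)
    while remaining:                      # step 2
        h = start_rule                    # step 3: most general rule
        neg_covered = set(negatives)      # step 4 (assumed initialisation)
        while neg_covered:                # step 5
            candidates = refine(h)        # step 6
            if not candidates:
                raise ValueError("no refinement excludes the negative examples")
            h = choose_best(candidates, remaining, neg_covered)     # step 7
            neg_covered = {e for e in neg_covered if covers(h, e)}  # step 8
        hypothesis.append(h)              # step 10
        pos_covered = {e for e in remaining if covers(h, e)}        # step 11
        if not pos_covered:
            raise ValueError("the learned rule covers no positive example")
        remaining -= pos_covered          # step 12
    return hypothesis                     # step 14
```

Keeping the three ingredients as parameters mirrors the separation between the search strategy and the reasoning services it relies on.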

All the rules of $\rho$ are correct, i.e. the rules $h'$ obtained by applying any of the rules of $\rho$ to $h \in \mathcal{L}$ are such that $h \succeq_{\mathcal{K}} h'$. This can be proved intuitively by observing that they act only on $body(h)$. Thus condition (i) of Definition 3 is satisfied. Furthermore, it is straightforward to notice that the application of any of the rules of $\rho$ to $h$ reduces the number of models of $h$. In particular, as for (SpecOntoLit_B), this intuition follows from the semantics of DLs. So condition (ii) is also fulfilled.

Example 10 With reference to Example 2, applying (AddDataLit_B+) to

$h_0^{LONER}$ LONER(X) $\leftarrow$

produces $h_1^{LONER}$ which can be further specialized by means of (AddOntoLit_B) and (AddDataLit_B-). Note that no other refinement rule can be applied to $h_1^{LONER}$ and that $h_2^{LONER}$ and $h_3^{LONER}$ are among the refinements of $h_1^{LONER}$.

Example 11 With reference to Example 3, applying (AddDataLit_B+) to

$h_0^{LIKES}$ LIKES(X,Y) $\leftarrow$

produces $h_1^{LIKES}$ which can be further specialized into $h_2^{LIKES}$, $h_3^{LIKES}$, $h_4^{LIKES}$ and $h_5^{LIKES}$ by means of (AddDataLit_B+) and (AddOntoLit_B). Note that no other refinement rule can be applied to $h_1^{LIKES}$ and that $h_5^{LIKES}$ can be also obtained as refinement from $h_4^{LIKES}$ via (SpecOntoLit_B).

3.3. An ILP algorithm 

The ingredients identified in the previous section are the starting point for the definition of ILP algorithms. Figure 1 reports the main procedure of a FOIL-like algorithm, named OR-FOIL, for learning onto-relational rules (FOIL is a popular ILP algorithm for learning sets of rules to be used as a classifier [42]). In OR-FOIL, analogously to FOIL, the outer loop (steps 2-13) corresponds to a variant of the sequential covering algorithm, i.e., it learns new rules one at a time, removing the positive examples covered by the latest rule before attempting to learn the next rule (steps 11-12). The hypothesis space search performed by OR-FOIL is best understood by viewing it hierarchically. Each iteration through the outer loop adds a new rule to its disjunctive hypothesis $\mathcal{H}$. The effect of each new rule is to generalize the current disjunctive hypothesis (i.e., to increase the number of instances it classifies as positive) by adding a new disjunct. Viewed at this level, the search is a bottom-up search through the space of hypotheses, beginning with the most specific empty disjunction (step 1) and terminating when the hypothesis is sufficiently general to cover all positive training examples (step 13). The inner loop (steps 5-9) performs a more fine-grained search to determine the exact definition of each new rule. This loop searches a second hypothesis space, consisting of conjunctions of literals, to find a conjunction that will form the body of the new rule. Within this space, it conducts a top-down, hill-climbing search, beginning with the most general preconditions possible (step 3), then refining the rule (step 6) until it avoids all negative examples. To select the most promising specialization from the candidates generated at each iteration, OR-Foil-ChooseBest (called at step 7) considers the performance of each candidate over $\mathcal{E}$ and chooses the one which maximizes the information gain. This measure is computed according to the following formula

$$\mathrm{GAIN}(h', h) = p \cdot (\log_2(cf(h')) - \log_2(cf(h))) \qquad (2)$$

where $p$ is the number of distinct variable bindings with which positive examples covered by the rule $h$ are still covered by $h'$ and $cf()$ is the confidence degree. Thus, the gain is positive iff $h'$ is more informative in the sense of Shannon's information theory (i.e. iff the confidence degree increases). If there are some literals to add which increase the confidence degree, the information gain tends to favor the literals that offer the best compromise between the confidence degree and the number of examples covered.
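Formula (2) translates directly into code once the confidence degrees are available as numbers. The Python fragment below is a minimal sketch; the function name `foil_gain` and its parameter names are hypothetical, and how the confidence degree itself is computed is left outside the sketch.

```python
import math


def foil_gain(p: int, cf_new: float, cf_old: float) -> float:
    """Information gain of a refinement h' over h as in formula (2).

    p       number of bindings of positives covered by h still covered by h'
    cf_new  confidence degree of the refinement h'
    cf_old  confidence degree of the rule h
    """
    if cf_new <= 0 or cf_old <= 0:
        # The logarithm is undefined for non-positive confidence degrees.
        raise ValueError("confidence degrees must be positive")
    return p * (math.log2(cf_new) - math.log2(cf_old))
```

For example, a refinement that doubles the confidence degree from 0.25 to 0.5 while keeping 2 bindings yields a gain of 2 * (-1 - (-2)) = 2, whereas a refinement that lowers the confidence degree yields a negative gain.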

One may think of using the confidence degree defined for $\mathcal{DL}$-FOIL (see Chapter ?? for more details), which takes the OWA into account. Indeed, many individuals may be available which cannot be classified as instances of the target concept nor of its negation. This requires a different setting able to deal with unlabeled individuals.

Example 12 With reference to Example 10 and Example 4, we suppose that $\mathcal{E}^- = \{e_1^{LONER}\}$, the other two examples being positive. The outer loop of OR-FOIL starts from $h_0^{LONER}$, which is further refined through the iterations of the inner loop. More precisely, it is first specialized into $h_1^{LONER}$ which in turn, since it covers negative examples, is then specialized into $h_2^{LONER}$ and $h_3^{LONER}$, out of which the rule $h_3^{LONER}$ is added to the hypothesis $\mathcal{H}^{LONER}$ because it does not cover negative examples. At this point the algorithm stops because $\mathcal{H}^{LONER}$ covers both positive examples.

Example 13 Following Example\TT\and Example^ we assume that £^ — {e^™'', 63™''} 
and £^ = {63™^}. At the end of the first iteration, /ig™^ is included into 7^^"^^ since 
it does not cover negative examples but only one positive example. 



4. Final Remarks and Directions of Research 

Building rules within ontologies poses several challenges not only to KR researchers investigating suitable hybrid DL-CL formalisms but also to the ML community, which has been historically interested in application areas where the knowledge acquisition bottleneck is particularly severe. In particular, ORL may open up new opportunities for KE because it will make systems available to support the knowledge engineer in her most demanding task, i.e. defining rules that extend or complement an ontology. Thus, ORL may produce time and cost savings in KE. In this chapter, we have reviewed the ML literature addressing the problem of learning onto-relational rules. Very few ILP works have been found that propose a solution to this problem [47,24,27]. They adopt CARIN-$\mathcal{ALN}$, $\mathcal{AL}$-log and $\mathcal{SHIQ}$+log as KR framework, respectively. Note that, by matching Table 2 against Table 1, one may figure out what the state of the art is and what the directions of research on onto-relational rules are from the ML viewpoint. The reader can also get suggestions on which of these ILP frameworks is the most appropriate to implement for a certain intended application. The specific solution illustrated in Section 3 takes advantage of an augmented expressive power thanks to the chosen $\mathcal{DL}$+log$^{\neg\vee}$ instantiation |^5l. It supports the evolution of ontologies with the creation of a concept or of a role, two change operations which both boil down to the addition of new rules to the input KB.

From the comparative analysis of the ILP frameworks reviewed in Section 2, a common feature emerges: all proposals resort to Buntine's generalized subsumption and extend it in a non-trivial way. This choice is due to the fact that, among the semantic generality orders in ILP, generalized subsumption applies only to definite clauses, and therefore suits well the hypothesis language in all three frameworks. Following these guidelines, new ILP frameworks can be designed to deal with more or differently expressive hybrid DL-CL languages according to the DL chosen (e.g., learning CARIN-$\mathcal{ALCNR}$ rules), or the clausal language chosen (e.g., learning recursive CARIN rules), or the integration scheme (e.g., learning CARIN rules with $\mathcal{DL}$-literals in the head). An important requirement will be the definition of a semantic generality relation for hypotheses to take into account the background knowledge. Of course, generalized subsumption may turn out not to be suitable for all cases, e.g. for the case of learning $\mathcal{DL}$+log$^{\neg\vee}$ rules [25]. Also it would be interesting to investigate how the nature of rules (i.e., the intended context of usage) may impact the learning process as for the scope of induction and other variables in the learning problem statement. For example, the problem of learning $\mathcal{AL}$-log rules for classification purposes differs greatly from the apparently similar learning problem faced in [32]. Finally, it is worth considering hybrid KR formalisms with loose and full integration schemes.

Besides theoretical issues, most future work will have to be devoted to implementation and application. When moving to practice, issues like efficiency and scalability become of paramount importance. These concerns may drive the attention of ILP research towards less expressive hybrid KR frameworks in order to gain in tractability, e.g. instantiations of $\mathcal{DL}$+log$^{\neg\vee}$ with DL-Lite [4]. Applications can come out of some of the many use cases for Semantic Web rules specified by the RIF W3C Working Group.
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